Group 009E05
The University of Sydney
V.S.
Performance Metrics
After 10-fold CV:
\[\begin{align} \widehat{\text{quality}} = &154.106 + 0.068(\text{fixed.acidity}) -\\ &1.888(\text{volatile.acidity}) + 0.083(\text{residual.sugar}) +\\ &0.003(\text{free.sulfur.dioxide}) - 154.291(\text{density}) +\\ &0.694(\text{pH}) + 0.629(\text{sulphates}) + 0.193(\text{alcohol})\\ \end{align}\\ \;\\\] \[\begin{array}{c|cccc} & \textrm{RMSE} & \textrm{MAE} & R^2 & \textrm{AIC}\\ \hline \textrm{Stepwise Select} & 0.753 & 0.585 & 0.278 & 11171.41\\ \end{array} \;\\\]Forward and backward selection chose the same model! But what about multicollinearity? We can check with the variance inflation factor (VIF).
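The metrics in the table above can all be computed from observed vs fitted values. A minimal pure-Python sketch (the data here are made up for illustration; the AIC uses the Gaussian log-likelihood, as for a linear model):

```python
import math

def regression_metrics(y, yhat, n_params):
    """RMSE, MAE, R^2 and Gaussian AIC from observed vs fitted values."""
    n = len(y)
    resid = [a - b for a, b in zip(y, yhat)]
    rss = sum(e * e for e in resid)
    rmse = math.sqrt(rss / n)
    mae = sum(abs(e) for e in resid) / n
    ybar = sum(y) / n
    r2 = 1 - rss / sum((a - ybar) ** 2 for a in y)
    # Gaussian AIC with the MLE error variance rss/n;
    # k counts the coefficients plus the error variance.
    k = n_params + 1
    aic = n * (math.log(2 * math.pi * rss / n) + 1) + 2 * k
    return rmse, mae, r2, aic

# Toy observed qualities vs hypothetical model predictions:
y = [5, 6, 5, 7, 6, 4]
yhat = [5.2, 5.8, 5.5, 6.4, 6.1, 4.6]
print(regression_metrics(y, yhat, n_params=2))
```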
\[VIF_i = \frac{1}{1-R^2_i}\] where \(R^2_i\) is the \(R^2\) from regressing predictor \(i\) on all of the other predictors.
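The definition above translates directly into code: for each predictor, fit an OLS regression of that column on the remaining columns and plug its \(R^2\) into the formula. A minimal pure-Python sketch on toy data (not the wine dataset), where one column is deliberately a near-copy of another so its VIF blows up:

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept added)."""
    n = len(y)
    Z = [[1.0] + row for row in X]      # prepend intercept column
    p = len(Z[0])
    XtX = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Xty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    yhat = [sum(b * z for b, z in zip(beta, Z[i])) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def vif(X):
    """VIF_i = 1 / (1 - R_i^2): regress column i on the remaining columns."""
    p = len(X[0])
    out = []
    for i in range(p):
        others = [[row[j] for j in range(p) if j != i] for row in X]
        target = [row[i] for row in X]
        out.append(1.0 / (1.0 - r_squared(others, target)))
    return out

# Toy data: x2 is nearly a copy of x0, so both get inflated VIFs.
random.seed(0)
x0 = [random.gauss(0, 1) for _ in range(200)]
x1 = [random.gauss(0, 1) for _ in range(200)]
x2 = [v + random.gauss(0, 0.05) for v in x0]
X = list(map(list, zip(x0, x1, x2)))
print([round(v, 1) for v in vif(X)])
```

The independent column keeps a VIF near 1, while the two collinear columns show very large VIFs, just as "density" did in our model.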
OK
Not OK
Remove “density”! But what if there was a better way?
\[\beta^{lasso}_\lambda = \underset{\beta}{\operatorname{arg\,min}} \Biggl\{ \underbrace{\sum_{i=1}^n\Biggl( y_i-\beta_0-\sum_{j=1}^p\beta_jx_{ij}\Biggr)^2}_{\text{Residual Sum of Squares}\; (RSS)}+\lambda\sum_{j=1}^p|\beta_j| \Biggr\}\]
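The objective above can be minimised by cyclic coordinate descent with soft-thresholding, which is what makes some coefficients exactly zero. A minimal pure-Python sketch on made-up data (real fits would use a tuned library implementation such as glmnet; here the columns are assumed centred so no intercept is needed):

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; the source of exact zeros in the lasso."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise RSS + lam * sum|beta_j| by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual leaving out feature j's contribution.
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            norm = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam / 2) / norm
    return beta

# Toy data: y depends only on x0, so the lasso should zero out x1.
random.seed(1)
x0 = [random.gauss(0, 1) for _ in range(100)]
x1 = [random.gauss(0, 1) for _ in range(100)]
y = [2.0 * a + random.gauss(0, 0.1) for a in x0]
X = list(map(list, zip(x0, x1)))
print(lasso_cd(X, y, lam=5.0))
```

The irrelevant coefficient is driven to (essentially) zero while the true one is only slightly shrunk, mirroring how the lasso dropped "density" for us below.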
10-fold CV gave us \(\log\lambda = -5.976\) or \(\lambda = 0.00254\). This gives:
\[\begin{align} \widehat{\text{quality}} = &2.732 - 0.039(\text{fixed.acidity}) -\\ &1.751(\text{volatile.acidity}) + 0.016(\text{residual.sugar}) -\\ &0.003(\text{chlorides}) -0.548(\text{free.sulfur.dioxide}) +\\ &0.036(\text{pH}) + 0.202(\text{sulphates}) + 0.335(\text{alcohol})\\ \end{align}\\ \;\\\] \[\begin{array}{c|cccc} & \textrm{RMSE} & \textrm{MAE} & R^2 & \textrm{AIC}\\ \hline \textrm{Stepwise Select} & 0.753 & 0.585 & 0.278 & 11171.41\\ \textrm{LASSO} & 0.751 & 0.584 & 0.281 & 11173.49\\ \end{array}\]Nice! We improved our model. However, as our response is a discrete variable, it may make more sense to use a different type of regression.
Our response variable “quality” is an ordinal variable. We utilise the log-odds, also known as the logit. Let’s say we have \(J\) categories:
\[ P(Y\leq j) \]
\[\log\Biggl(\frac{P(Y\leq j)}{P(Y> j)}\Biggr) = \text{logit}(P(Y\leq j))\] \[\text{logit}(P(Y\leq j)) = \beta_{0j} - \sum_{k=1}^{p}\beta_kx_k, \quad j = 1,\dots,J-1\]
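Inverting the logit gives the cumulative probabilities, and differencing adjacent ones gives the probability of each category. A minimal pure-Python sketch with hypothetical cutpoints \(\beta_{0j}\) for a five-category quality scale and a made-up linear predictor (a real fit would estimate these, e.g. with an ordinal regression routine):

```python
import math

def logistic(z):
    """Inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(cutpoints, eta):
    """Proportional-odds model: P(Y<=j) = logistic(beta_0j - eta).
    cutpoints must be increasing; eta is the linear predictor beta.x."""
    cum = [logistic(t - eta) for t in cutpoints] + [1.0]
    # Difference the cumulative probabilities to get per-category ones.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical cutpoints and linear predictor (illustrative values only).
theta = [-2.0, -0.5, 1.0, 2.5]
probs = category_probs(theta, eta=0.8)
print([round(v, 3) for v in probs])
```

Because the cutpoints are increasing, the cumulative probabilities are too, so every category probability is non-negative and they sum to one.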
*Nagelkerke’s Pseudo \(R^2\)